Add robust token counter with 0 default on failure for ollama_chat #7380
+22
−7
Title
This is a fix for the latest ollama, which now implements function calling properly and returns a JSON object in messages. The current litellm code tries to pass that object to the token counter, which attempts to concatenate it with text and fails.
I could not find where this code is tested, so please advise on that if needed.
Instead of fixing this in token_counter, this PR:
Since returning 0 tokens in the extreme failure scenario (both ollama and token_counter failing) is potentially expensive for customers, let me know if you would prefer that case to hard fail instead, as it did previously.
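For reviewers, the fallback order described above can be sketched roughly as follows. This is only an illustration, not the code in the diff: `robust_token_count` and `naive_counter` are hypothetical names, and `naive_counter` is a stand-in that fails on non-string input in the same way the concatenation error does.

```python
from typing import Any, Callable, Optional


def robust_token_count(
    reported: Optional[int],
    fallback_counter: Callable[[Any], int],
    payload: Any,
) -> int:
    """Return a token count without ever raising (hypothetical helper)."""
    # 1. Prefer the count reported by the server, if it looks valid.
    if isinstance(reported, int) and not isinstance(reported, bool) and reported >= 0:
        return reported
    # 2. Otherwise try the local counter, which may raise on unexpected input.
    try:
        return int(fallback_counter(payload))
    except Exception:
        # 3. Both sources failed: default to 0 instead of failing the request.
        return 0


def naive_counter(text: Any) -> int:
    # Stand-in counter: concatenating a dict with a str raises TypeError.
    return len(("" + text).split())


# Server-reported count wins when present.
print(robust_token_count(12, naive_counter, "hi"))
# No reported count: fall back to the local counter.
print(robust_token_count(None, naive_counter, "hello world"))
# No reported count and a JSON object as content: default to 0.
print(robust_token_count(None, naive_counter, {"name": "get_weather"}))
```

Swapping the final `return 0` for a re-raise would restore the previous hard-fail behavior for that last case.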
Relevant issues
#6958 (should be fixed in this PR)
#7094 (not addressed here explicitly, unclear how streaming json objects should behave)
Type
🐛 Bug Fix
Changes
[REQUIRED] Testing - Attach a screenshot of any new tests passing locally
If UI changes, send a screenshot/GIF of working UI fixes
llama3.1:70b-instruct-q4_K_M